Character sets of strings
Abstract
Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S′ of S is the subset C ⊆ Σ of the symbols occurring in S′. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or comparative genomics. Several queries about the character sets of a string arise from these applications, especially: (1) Output all the maximal locations of substrings having a given character set. (2) Output for each character set C occurring in a given string (or a given collection of strings) all the maximal locations of C. Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n²) time using Θ(n) space. The second algorithm has Θ(n|Σ| log |Σ|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. (J. Discr. Alg., 26:1–13, 2003).
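To make query (1) concrete, here is a minimal Python sketch. It is an illustration, not the paper's algorithm. It assumes that a location is "maximal" when the substring cannot be extended to the left or right without picking up a symbol outside C; the abstract does not spell this definition out. The function name `maximal_locations` and the 0-based inclusive `(start, end)` output format are hypothetical choices made for the example.

```python
def maximal_locations(s, charset):
    """Return 0-based inclusive (start, end) pairs of maximal substrings of `s`
    whose character set is exactly `charset`.

    Assumption: "maximal" means the substring cannot be extended on either
    side without including a symbol outside `charset`.
    """
    target = set(charset)
    if not target:
        return []
    result = []
    seen = set()   # symbols of `target` seen in the current run
    start = None   # start index of the current run, or None if outside a run
    for i, ch in enumerate(s):
        if ch in target:
            if start is None:
                # A new run of symbols from `target` begins here.
                start = i
                seen = set()
            seen.add(ch)
        else:
            # The run (if any) ends at i - 1; report it only if it used
            # every symbol of `target`.
            if start is not None and len(seen) == len(target):
                result.append((start, i - 1))
            start = None
    # Handle a run that reaches the end of the string.
    if start is not None and len(seen) == len(target):
        result.append((start, len(s) - 1))
    return result


print(maximal_locations("abcabdab", "ab"))
```

The sketch makes a single pass over the string and relies on hash-based set membership, so it runs in expected linear time for a fixed query set. For the string "abcabdab" and C = {a, b}, the runs "ab" at positions 0–1, 3–4, and 6–7 each contain both symbols and are bounded by symbols outside C, so all three are reported.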
Similar resources
Extraction of Character Strings from House Maps
In this paper, we propose an experimental extraction method of character strings from house map images, using the block information. Our method consists of two steps: the first is to recognize the block information, and the second is to extract character strings with respect to the recognized block information. In comparison with urban maps, which have often been investigated for extractio...
Finite state intensional semantics
Suppose possible worlds are strings, rather than physically structured worlds like ours. Then the proposition corresponding to a sentence or a formula in logical language is a set of strings; an epistemic acquaintance relation is a relation between strings; and in a relational construction of partition semantics for questions, a question meaning is a relation between strings. If discourse refer...
An example of design optimization for high evolvability: string rewriting grammar.
As an example of the optimization of an evolutionary system design, a string rewriting system is studied. A set of rewriting rules that defines the growth of a string is experimentally optimized in terms of maximizing the 'replicative capacity', that is the occurrence ratio of self-replicating strings. It is shown that the most optimized rule set allows many strings to self-replicate by using ...
String Distances and Uniformities
The Levenshtein or edit distance was developed as a metric for calculating distances between character strings. We are looking at weighting the different edit operations (insertion, deletion, substitution) to obtain different types of classifications of sets of strings. As a more general and less constrained approach we introduce topological notions and in particular uniformities.
Distance Based Indexing for String Proximity Search
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures a...
Journal title:
- J. Discrete Algorithms
Volume 5, Issue
Pages -
Publication date 2007